Recipes for open vocabulary keyword spotting #1428
Hello, what's the current progress of this PR? Thanks!
Developing the runtime first (see k2-fsa/sherpa-onnx#505); will clean up this PR soon.
Would it be possible to implement a KWS system using the output of the CTC branch, transforming it into a lattice so that Kaldi's decoders can be used, similar to what is done with kaldi-decoder/faster-decoder.h and kaldi-decoder/decodable-ctc.h? Which parts of the Kaldi code would need to be implemented in kaldi-decoder to achieve that? Could you give me some direction?
@alucassch I do plan to use the CTC branch, but I don't think I will use the Kaldi decoders. If you want to use the Kaldi decoders: you can compile the keywords into a lattice, then decode the audio with this lattice (a faster decoder should be enough, I think). Then, for each frame (or chunk), you match the suffix of the decoded results against the keyword candidates; if there is a match and the log-prob is larger than a given threshold, the corresponding keyword is triggered. Sorry, I don't have much experience in this direction, so this is just my thought; you can try it yourself.
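For illustration, here is a minimal Python sketch of the suffix-matching step described above. It is not this PR's implementation; it assumes some external decoder yields the running list of decoded token ids plus per-token log-probs, and all names (`match_keyword_suffix`, `keywords`, `threshold`) are hypothetical.

```python
# A toy sketch of suffix matching with a log-prob threshold, as described
# above. Illustrative only; not the PR's actual decoder.
from typing import Dict, List, Optional, Tuple


def match_keyword_suffix(
    decoded: List[int],               # token ids decoded so far
    logprobs: List[float],            # per-token log-probs, aligned with `decoded`
    keywords: Dict[str, List[int]],   # keyword name -> token-id sequence
    threshold: float = -5.0,          # minimum total log-prob to trigger
) -> Optional[Tuple[str, float]]:
    """Return (keyword, score) if a keyword matches the suffix of `decoded`
    and its accumulated log-prob exceeds `threshold`, else None."""
    for name, tokens in keywords.items():
        n = len(tokens)
        if 0 < n <= len(decoded) and decoded[-n:] == tokens:
            score = sum(logprobs[-n:])
            if score > threshold:
                return name, score
    return None


# Usage: call once per decoded frame/chunk.
hit = match_keyword_suffix(
    decoded=[12, 7, 31],
    logprobs=[-0.2, -0.5, -0.3],
    keywords={"hey snips": [12, 7, 31]},
)
if hit is not None:
    print(f"triggered: {hit[0]} (logprob {hit[1]:.2f})")
```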
Here are some results of this PR; you can find more details in the RESULTS.md of each recipe.

English: the positive set is from https://github.com/pkufool/open-commands; the negative set is the test set of gigaspeech. Each metric has two columns, one for the original model trained on gigaspeech, the other for the fine-tuned model. (Tables for "small" and "large" are in RESULTS.md.)

Chinese: the positive set is from https://github.com/pkufool/open-commands; the negative set is the test-net set of wenetspeech. Each metric has two columns, one for the original model trained on wenetspeech, the other for the fine-tuned model. (Tables for "small" and "large and others" are in RESULTS.md.)
Are these numbers the result of extensive search, or chosen with some intuition? Thanks!
No, just with some intuition. We are also searching for better and smaller models.
Hello, is there a pretrained PyTorch (.pt) model from before the KWS fine-tuning? I only see the ONNX ones.
See RESULTS.md; the links are there.
@KIM7AZEN Thanks! Could you make a PR to fix it?
OK, wait a moment.
What do 'small' and 'large and others' mean here? Are they referring to the sizes of the models or the sizes of different test sets? Why does the larger one seem to perform worse than the smaller one?
@zhuangweiji They are test sets; see https://github.com/k2-fsa/icefall/blob/master/egs/wenetspeech/KWS/RESULTS.md for more details.
This is an initial version of the decoder for an open vocabulary keyword spotting system. The idea is almost the same as the context biasing system we proposed before; I improved the `ContextGraph` so that users can trade off `recall` and `precision` easily (a rough sketch of the idea follows below). I also trained some small zipformer models (around 3M parameters) on gigaspeech (for English) and wenetspeech (for Chinese) for keyword spotting; I will update the results and models in the following commits soon.
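To make the recall/precision trade-off concrete, here is a hedged sketch of the underlying mechanism: a token trie where a per-token boost raises recall and a trigger threshold on the averaged score raises precision. This only illustrates the idea; the actual `ContextGraph` in icefall is more elaborate (e.g. it handles partial matches and backoff), and every name below is made up.

```python
# Illustrative only: a toy keyword trie showing how a boost and a threshold
# trade off recall and precision. Not the actual icefall ContextGraph.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrieNode:
    children: Dict[int, "TrieNode"] = field(default_factory=dict)
    keyword: Optional[str] = None  # set on the last token of a keyword


class ToyKeywordGraph:
    def __init__(self, boost: float, threshold: float):
        self.root = TrieNode()
        self.boost = boost          # added per matched token; larger -> higher recall
        self.threshold = threshold  # required average score; larger -> higher precision

    def add_keyword(self, name: str, tokens: List[int]) -> None:
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, TrieNode())
        node.keyword = name

    def match(self, tokens: List[int], logprobs: List[float]) -> Optional[str]:
        """Walk the trie over a decoded suffix; trigger when a complete
        keyword's boosted average log-prob clears the threshold."""
        node, total = self.root, 0.0
        for i, (t, lp) in enumerate(zip(tokens, logprobs), start=1):
            if t not in node.children:
                return None
            node = node.children[t]
            total += lp + self.boost
            if node.keyword is not None and total / i >= self.threshold:
                return node.keyword
        return None


# Example: a generous boost with a strict threshold.
graph = ToyKeywordGraph(boost=1.0, threshold=-0.5)
graph.add_keyword("hey snips", [12, 7, 31])
print(graph.match([12, 7, 31], [-0.4, -0.9, -0.6]))  # -> "hey snips"
```

Raising `boost` lets acoustically weak matches survive (more recall, more false alarms); raising `threshold` suppresses marginal matches (more precision, more misses), which is the knob pair this PR exposes to users.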